Introduction

R has a full library of tools for working with spatial data. This includes tools for both vector and raster data, as well as interfacing with data from other sources (like ArcGIS) and making maps.

There tons of libraries in the R universe that provide functionality to work with spatial data. You can find an overview of these packages here. In this document, I will introduce four highly popular libraries:

Moreover, I will point you to a recently written package called sf (for Simple Features). It follows the tidyverse philosphy and a pretty interesting improvement on sp/rgdal/rgeos. It may replace these packages in the future.

# load package for vector data
library(sp)
library(rgdal)

# load package for raster data
library(raster)

# load package for working with geometric options
library(rgeos)

# new package for vector data; will replace sp, rgdal, and rgeos for vector data
library(sf)

# data management and plotting
library(tidyverse)

1. Spatial Data Types in R

First, I will introduce you to the two types of spatial data you will encounter in R: vector data and raster data.

1.1. Vector Data

Almost all spatial vector data structures in R are based on the sp package. Even other libraries that may seem independent are usually built on top of sp, even if you can’t see it.

The sp package has three main types of spatial data we’ll work with: points, lines, and polygons. There are some differences between these different types, but they are all very similar.

To help you understand what these data structures are like, in this section we’ll create some spatial data from scratch. This is probably not how you’ll work with data in the future – most of the time you just import spatial data from a source – but this exercise will help give you a good foundation and help you know how to troubleshoot problems in the future.

There are three basic steps to creating spatial data by hand:

  • Create geometric objects (points, lines, or polygons)
  • Convert those geometric objects to Spatial* objects (* stands for Points, Lines, or Polygons)
    • Geometric objects live in an abstract space (the x-y plane). To make them spatial objects, we also need to include information on how those x-y coordinates relate the places in the real world using a Coordinate Reference System (CRS).
  • (Optional:) Add a data frame with attribute data, which will turn your Spatial* object into a Spatial*DataFrame object.

1.1.1 SpatialPoints: Your First Spatial* Object!

Points are the most basic geometric shape, so we begin by building a SpatialPoints object.

Make Points.

A points is defined by a pair of x-y coordiantes, so we can create a set of points by (a) creating matrix of x-y points, and (b) passing them to the SpatialPoints function to create our first SpatialPoints object:

toy.coordinates <- rbind(c(1.5, 2.00),
                          c(2.5, 2.00),
                          c(0.5, 0.50),
                          c(1.0, 0.25),
                          c(1.5, 0.00),
                          c(2.0, 0.00),
                          c(2.5, 0.00),
                          c(3.0, 0.25),
                          c(3.5, 0.50))

toy.coordinates
      [,1] [,2]
 [1,]  1.5 2.00
 [2,]  2.5 2.00
 [3,]  0.5 0.50
 [4,]  1.0 0.25
 [5,]  1.5 0.00
 [6,]  2.0 0.00
 [7,]  2.5 0.00
 [8,]  3.0 0.25
 [9,]  3.5 0.50
my.first.points <- SpatialPoints(toy.coordinates) # ..converted into a spatial object
plot(my.first.points)

To get a summary of how R sees these points, we can ask it for summary information in a couple different ways. Here’s a summary of available commands:

summary(my.first.points)
Object of class SpatialPoints
Coordinates:
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: NA 
proj4string : [NA]
Number of points: 9
coordinates(my.first.points)
      coords.x1 coords.x2
 [1,]       1.5      2.00
 [2,]       2.5      2.00
 [3,]       0.5      0.50
 [4,]       1.0      0.25
 [5,]       1.5      0.00
 [6,]       2.0      0.00
 [7,]       2.5      0.00
 [8,]       3.0      0.25
 [9,]       3.5      0.50

Add a Coordinate Reference System (CRS)

Unlike a simple geometric object, a SpatialPoints object has the ability to keep track of how the coordinates of its points relate to places in the real world through an associated “Coordinate Reference System” (CRS – the combination of a geographic coordinate system and possibly a projection), which is stored using a code called a proj4string. The proj4string is so important to a SpatialPoints object, that it’s presented right in the summary:

summary(my.first.points)
Object of class SpatialPoints
Coordinates:
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: NA 
proj4string : [NA]
Number of points: 9

In this case, however, while our SpatialPoints object clearly knows what a CRS is, the Spatial object we just created does not have a projection or geographic coordinate system defined. It is ok to plot, but be aware that for many meaningful spatial operations you will need to define a CRS.

CRS objects can be created by passing the CRS() function the code associated with a known projection. You can find the codes for most commonly used projections from www.spatialreference.org.

Note that the same CRS can often be called in many ways. For example, one of the most commonly used CRS is the WGS84 latitude-longitude projection. You can create a WGS84 lat-long projection object by either passing the reference code for the projection – CRS("+init=epsg:4326") – or by directly calling all its relevant parameters – CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"). Your choice of CRS ususally depends on when the data was collected, the geographic extent of the data, and the purpose of the data. The previous example – WGS84 (EPSG: 4326) Latitude/Longitude – is commonly used for data for the entire globe or many countries.

Here’s an illustration of assigning a CRS:

is.projected(my.first.points) # see if a projection is defined.
[1] NA
  # Returns `NA` if no geographic coordinate system or projection; returns FALSE if has geographic coordinate system but no projection.

crs.geo <- CRS("+init=epsg:32633")  # UTM 33N, UTM projection for these coordinates
proj4string(my.first.points) <- crs.geo  # define projection system of our data
is.projected(my.first.points)
[1] TRUE
summary(my.first.points)
Object of class SpatialPoints
Coordinates:
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: TRUE 
proj4string :
[+init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs
+ellps=WGS84 +towgs84=0,0,0]
Number of points: 9

When CRS is called with only an EPSG code, R will try to complete the CRS with the parameters looked up in the EPSG table.

Geometry-only objects (objects without attributes) can be subsetted similar to how vectors or lists are subsetted; we can select the first two points by

my.first.points[1:2]
class       : SpatialPoints 
features    : 2 
extent      : 1.5, 2.5, 2, 2  (xmin, xmax, ymin, ymax)
coord. ref. : +init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 

Add Attributes

Moving from a SpatialPoints to a SpatialPointsDataFrame occurs when you add a data.frame of attributes to the points. Let’s just add an arbitrary table to the data – this will label each point with a letter and a number. Note points will merge with the data.frame based on the order of observations.

df <- tibble(attr1 = c('a','b','z','d','e','q','w','r','z'), attr2 = c(101:109))
df
# A tibble: 9 x 2
  attr1 attr2
  <chr> <int>
1     a   101
2     b   102
3     z   103
4     d   104
5     e   105
6     q   106
7     w   107
8     r   108
9     z   109
my.first.spdf <- SpatialPointsDataFrame(my.first.points, df)
summary(my.first.spdf)
Object of class SpatialPointsDataFrame
Coordinates:
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: TRUE 
proj4string :
[+init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs
+ellps=WGS84 +towgs84=0,0,0]
Number of points: 9
Data attributes:
    attr1               attr2    
 Length:9           Min.   :101  
 Class :character   1st Qu.:103  
 Mode  :character   Median :105  
                    Mean   :105  
                    3rd Qu.:107  
                    Max.   :109  

Now that we have attributes, we can also subset our data the same way we would subset a data.frame. Some subsetting:

my.first.spdf[1:2, ]        # row 1 and 2 only
class       : SpatialPointsDataFrame 
features    : 2 
extent      : 1.5, 2.5, 2, 2  (xmin, xmax, ymin, ymax)
coord. ref. : +init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
variables   : 2
names       : attr1, attr2 
min values  :     a,   101 
max values  :     b,   102 
my.first.spdf[1:2, "attr1"] # row 1 and 2 only, attr1
class       : SpatialPointsDataFrame 
features    : 2 
extent      : 1.5, 2.5, 2, 2  (xmin, xmax, ymin, ymax)
coord. ref. : +init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
variables   : 1
names       : attr1 
min values  :     a 
max values  :     b 
plot(my.first.spdf[which(my.first.spdf$attr2 > 105),])    # select if attr2 > 5


SpatialPoint from a lon/lat table

A SpatialPointsDataFrame object can be created directly from a data.frame by specifying which columns contain the coordinates. This is interesting, for example if you have a spreadsheet that contains latitude, longitude and some values. You can create the object from the data frame in one step by using the coordinates() command. That automatically turns the dataframe object into a SpatialPointsDataFrame.

df_with_coords <- tibble(
  lon = toy.coordinates[,1],
  lat = toy.coordinates[,2],
  attr1 = c('a','b','z','d','e','q','w','r','z'),
  attr2 = c(101:109))

my.second.spdf <- SpatialPointsDataFrame(
  coordinates(select(df_with_coords, lon, lat)), select(df_with_coords, -(lon:lat)))
summary(my.second.spdf)
Object of class SpatialPointsDataFrame
Coordinates:
    min max
lon 0.5 3.5
lat 0.0 2.0
Is projected: NA 
proj4string : [NA]
Number of points: 9
Data attributes:
    attr1               attr2    
 Length:9           Min.   :101  
 Class :character   1st Qu.:103  
 Mode  :character   Median :105  
                    Mean   :105  
                    3rd Qu.:107  
                    Max.   :109  

1.1.2 SpatialPolygons

SpatialPolygons are very, very common (think administrative borders, countries, etc.), so they’re important to get used to.

Building up a SpatialPolygons from scratch.

SpatialPolygons are a little more complicated than SpatialPoints. With SpatialPoints, we moved directly from x-y coordinates to a SpatialPoints object.

To get a SpatialPolygons object, we have to build it up by (a) creating Polygon objects, (b) combining those into Polygons objects (note the “s” at the end), and finally (c) combining those to create SpatialPolygons. So what are these components?

  • A Polygon object is a single geometric shape (e.g. a square, rectangle, etc.) defined by a single uninterrupted line around the exterior of the shape.
  • A Polygons object consists of one or more simple geometric objects (Polygon objects) that combine to form what you think of as a single unit of analysis (an “observation”). For example, each island in Hawaii would be a Polygon, but Hawaii itself is the Polygons consisting of all the individual island Polygon objects.
  • A SpatialPolygons object is a collection of Polygons objects, where each Polygons object is an “observation”. For example, if we wanted a map of US states, we would make a Polygons object for each state, then combine them all into a single SpatialPolygons. If you’re familiar with shapefiles, SpatialPolygons is basically the R analogue of a shapefile or layer.

One special note: if you want to put a hole in a polygon (e.g. drawing a donut, or if you wanted to draw South Africa and leave a hole in the middle for Lesotho) you do so by (a) creating a Polygon object for the outline, (b) creating a second Polygon object for the hole and passing the argument hole=TRUE, and (c) combine the two into a Polygons object.

Let’s try building up a SpatialPolygon!

# create polyon objects from coordinates.
# Each object is a single geometric polygon defined by a bounding line.
house1.building <-  Polygon(rbind(c(1, 1),
                                  c(2, 1),
                                  c(2, 0),
                                  c(1, 0)))

house1.roof <- Polygon(rbind(c(1.0, 1),
                             c(1.5, 2),
                             c(2.0, 1)))

house2.building <-  Polygon(rbind(c(3, 1),
                                  c(4, 1),
                                    c(4, 0),
                                    c(3, 0)))

house2.roof <- Polygon(rbind(c(3.0, 1),
                             c(3.5, 2),
                             c(4.0, 1)))

house2.door <-  Polygon(rbind(c(3.25,0.75),
                              c(3.75,0.75),
                              c(3.75,0.00),
                              c(3.25,0.00)),
                              hole=TRUE)

# create lists of polygon objects from polygon objects and unique ID
# A `Polygons` is like a single observation.
h1 <-  Polygons(list(house1.building, house1.roof), "house1")
h2 <-  Polygons(list(house2.building, house2.roof, house2.door), "house2")

# create spatial polygons object from lists
# A SpatialPolygons is like a shapefile or layer.
houses <-  SpatialPolygons(list(h1,h2))
plot(houses)

Adding Attributes to SpatialPolygon

As with SpatialPoints, we can associated a data.frame with SpatialPolygons. There are two things that are important about doing this with SpatialPolygons:

  1. When you first associate a data.frame with a SpatialPolygons object, R will line up rows and polygons by matching Polygons object names with the data.frame row.names.
  2. After the initial association, this relationship is NO LONGER based on row.names! For the rest of the SpatialPolygonsDataFrame’s life, the association between Polygons and rows of your data.frame is based on the order of rows in your data.frame, so don’t try to change the order of the data.frame by hand!

Make attributes and plot. Note how the door – which we created with hole=TRUE is empty!

attr <- data.frame(attr1=1:2, attr2=6:5, row.names=c("house2", "house1"))
houses.DF <- SpatialPolygonsDataFrame(houses, attr)
as.data.frame(houses.DF)      # Notice the rows were re-ordered!
       attr1 attr2
house1     2     5
house2     1     6
spplot(houses.DF)

Adding a Coordinate Reference Systems (CRS)

As with SpatialPoints, a SpatialPolygons object on Earth doesn’t actually know where it until you set its Coordinate Reference System, which you can do the same way you did with the SpatialPoints objects:

crs.geo <- CRS("+init=EPSG:4326")  # geographical, datum WGS84
proj4string(houses.DF) <- crs.geo  # define projection system of our data

1.1.3 SpatialLines: Just like SpatialPolygons

SpatialLines objects are basically like SpatialPolygons, except they’re built up using Line objects (each a single continuous line, like each branch of a river), Lines objects (collection of lines that form a single unit of analysis, like all the parts of a river), to SpatialLines (collection of “observations”, like rivers).

1.1.4 Recap of Spatial* Objects

Here’s a quick summary of the construction flow of SpatialObjects:

DEM reproject

DEM reproject

Turns out Spatial* objects are just a collection of things that you can see using the str command:

str(my.first.spdf)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :Classes 'tbl_df', 'tbl' and 'data.frame': 9 obs. of  2 variables:
  .. ..$ attr1: chr [1:9] "a" "b" "z" "d" ...
  .. ..$ attr2: int [1:9] 101 102 103 104 105 106 107 108 109
  ..@ coords.nrs : num(0) 
  ..@ coords     : num [1:9, 1:2] 1.5 2.5 0.5 1 1.5 2 2.5 3 3.5 2 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  ..@ bbox       : num [1:2, 1:2] 0.5 0 3.5 2
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"

What this shows is that my.first.spdf is actually a collection of different things – data, coords, bbox, and a proj4string. In the same way we can get many of these things with commands, we can also call them with the @ command!

bbox(my.first.spdf)
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
my.first.spdf@bbox
          min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
coordinates(my.first.spdf)
      coords.x1 coords.x2
 [1,]       1.5      2.00
 [2,]       2.5      2.00
 [3,]       0.5      0.50
 [4,]       1.0      0.25
 [5,]       1.5      0.00
 [6,]       2.0      0.00
 [7,]       2.5      0.00
 [8,]       3.0      0.25
 [9,]       3.5      0.50
my.first.spdf@coords
      coords.x1 coords.x2
 [1,]       1.5      2.00
 [2,]       2.5      2.00
 [3,]       0.5      0.50
 [4,]       1.0      0.25
 [5,]       1.5      0.00
 [6,]       2.0      0.00
 [7,]       2.5      0.00
 [8,]       3.0      0.25
 [9,]       3.5      0.50
as.data.frame(my.first.spdf)
  attr1 attr2 coords.x1 coords.x2
1     a   101       1.5      2.00
2     b   102       2.5      2.00
3     z   103       0.5      0.50
4     d   104       1.0      0.25
5     e   105       1.5      0.00
6     q   106       2.0      0.00
7     w   107       2.5      0.00
8     r   108       3.0      0.25
9     z   109       3.5      0.50
my.first.spdf@data
# A tibble: 9 x 2
  attr1 attr2
  <chr> <int>
1     a   101
2     b   102
3     z   103
4     d   104
5     e   105
6     q   106
7     w   107
8     r   108
9     z   109

In R, the items with @ at the start of the line are called “slots”. If you are used to working in another object-oriented language like Java or Python, these are analogous to attributes. You can access each of these “slots” using the @ operator, making them equivalent of some function calls. It is a general recommendation to not modify slots by directly assigning values to them unless you know very well what you do.

1.2 Raster Data

Rasters are much more compact than vectors. Because of their regular structure the coordinates do not need to be recorded for each pixel or cell in the rectangular extent. A raster has a CRS, an origin, a distance or cell size in each direction, a dimension in terms of numbers of cells, and an array of values. If necessary, the coordinates for any cell can be computed.

Note that the sp library used for vector data does have some basic tools for manipulating raster data. However, the sp library has largely been replaced by the raster library we will use here, and anything one can do with the sp library can also be done with the raster library.

1.2.1 Creating Raster Data From Scratch

A raster dataset has three primary components:

  • A grid, which consists of:
    • dimensions (number of rows and columns),
    • resolution (size of sides of each cell),
    • and extent (where the edges of the grid “are”)
  • A set of values associated with each cell in the grid
  • Projection data about how the grid relates to the physical world

It’s relatively easy to start a raster object by just defining the grid:

basic_raster <- raster(ncol=5, nrow=10, xmn=0, xmx=5, ymn=0, ymx=10)
basic_raster
class       : RasterLayer 
dimensions  : 10, 5, 50  (nrow, ncol, ncell)
resolution  : 1, 1  (x, y)
extent      : 0, 5, 0, 10  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 

However, note that this raster has a grid, but no data:

hasValues(basic_raster)
[1] FALSE

We add data to the raster object using the values function:

values(basic_raster) <-  1:50  # Note 50 is the total number of cells in the grid.
plot(basic_raster)

Note even though a grid is a 2-dimensional object, raster looks for data that is one-dimensional, then assigns the values in the DataFrame by (a) starting in the top left cell, then (b) moving across the row from left to right, and finally (c) moving down a row and repeating the whole process. Thus each column must be the length of the total number of cells.

Defining projection

To define a projection, we use the same proj4 strings as vector data, but without the intermediate step of creating a CRS object:

projection(basic_raster) <- "+init=EPSG:4326"

2. Importing and Exporting Spatial Data

  • use the rgdal package for Vector data
  • use the raster package for Raster data

2.1 Importing and Exporting Vector Data

Normally we do not create Spatial* objects by hand. It is much more common to work with existing data read from external sources like shapefiles, or databases.

In order to read those into R and turn them into Spatial* family objects we rely on the rgdal package. It provides us direct access to the powerful GDAL library from within R.

We can read in and write out spatial data using:

readOGR() and writeOGR()

The parameters provided for each function varies depending on the exact spatial file type you are reading. We will take the ESRI shapefile as an example. A shapefile consists of various files, and R expects all those files to be in one directory.

When reading in a shapefile, readOGR() expects at least the following two arguments:

datasource name (dsn)  # the path to the folder that contains the files
                       # Note that this is a path to the folder
layer name (layer)     # the file name without extension
                       # Note that this is not a path but just the name of the file

For example, if I have a shapefile called sf_incomes.shp and all its associated files (like .dbf, .prj, .shx) in a directory called RGIS1_Data/shapefiles/ in my project folder, my command to read this shapefile would look like this:

my_shapefile <- readOGR(dsn = "RGIS1_Data/shapefiles/", layer = "sf_incomes")
OGR data source with driver: ESRI Shapefile 
Source: "RGIS1_Data/shapefiles/", layer: "sf_incomes"
with 182 features
It has 3 fields
Integer64 fields read as strings:  Trc2000 
plot(my_shapefile)

Note you may run into trouble if you set your working directory to the folder holding the shapefile as ‘readOGR()’ doesn’t like it if the first argument is an empty string.

2.2 Reading Raster data from files

The raster library can also read many file types on it’s own. For example, let’s load SF Elevation data.

my_raster_file <- raster("RGIS1_Data/sanfrancisconorth.dem")
plot(my_raster_file)

3. Geo-Coding

This section provides an overview of tools for geocoding – converting addresses or names of locations into latitudes and longitudes – using the google maps API.

3.1 Geo-Coding with Google Maps

Google offers a service that allows users to submit requests for the latitudes and longitudes associated with different addresses or place names from within R and to get back results that are easy to work within R. This service is called a google geocoding API.

Basically, the google maps API will accept any query you could type into Google Maps and returns information on Google’s best guess for the latitude and longitude associated with your query. The tool for doing this from within R is found int he ggmap library, and the basic syntax is as follows:

library(ggmap)

addresses <- c("1600 Pennsylvania NW, Washington, DC", "denver, co")

locations <- geocode(addresses, source="google", output = "more")
locations
         lon      lat          type     loctype
1  -77.03657 38.89766 establishment     rooftop
2 -104.99025 39.73924      locality approximate
                                              address    north    south
1 1600 pennsylvania ave nw, washington, dc 20500, usa 38.89896 38.89626
2                                     denver, co, usa 39.91425 39.61443
        east       west                         route street_number
1  -77.03539  -77.03808 Pennsylvania Avenue Northwest          1600
2 -104.60030 -105.10993                          <NA>          <NA>
          neighborhood   locality administrative_area_level_1
1 Northwest Washington Washington        District of Columbia
2                 <NA>     Denver                    Colorado
        country postal_code administrative_area_level_2
1 United States       20500                        <NA>
2 United States        <NA>               Denver County

Note the output option can be set to “latlon”, “latlona”, “more”, or “all” depending on how much information you want back. I would STRONGLY recommend always using the output="more" option so that you get information on how certain google is about its guess!

3.2 Interpreting geocode results

Geocoding results include two fields that are very important to understand: loctype and type.

type

Google thinks of the world as containing a number of different types of locations. Some are points (like houses), while others are areas (like cities). The type field tells you if google is giving you the location of a house, or just the centroid of a city. A full list of different types is available from google here, but the most common results (in my experience) are:

  • street_address: indicates a precise street address
  • locality: indicates an incorporated city or town political entity
  • point_of_interest: indicates a named point of interest. Typically, these “POI”s are prominent local entities that don’t easily fit in another category, such as “Empire State Building” or “Statue of Liberty.”
  • administrative_area_level_[SOME NUMBER]: these “civil entities”, where 0 is the country, 1 is the first administrative level below that (in the US, states), 2 is below that (in the US, counties), etc.

This is important because if you get a locality or administrative area, the latitude and longitude you get it just the centroid of the locality, and you should interpret it as such!

loctype

The loctype column provides similar but distinct information to type, including more information about street_address results. In order of precision, the possible values for this field are:

  • “ROOFTOP” indicates that the returned result is a precise geocode for which we have location information accurate down to street address precision.
  • “RANGE_INTERPOLATED” indicates that the returned result reflects an approximation (usually on a road) interpolated between two precise points (such as intersections). Interpolated results are generally returned when rooftop geocodes are unavailable for a street address.
  • “GEOMETRIC_CENTER” indicates that the returned result is the geometric center of a result such as a polyline (for example, a street) or polygon (region).
  • “APPROXIMATE” indicates that the returned result is approximate. localities and admin areas have this value!

3.3 Query Limits

Google limits individual (free) users to 2,500 queries per day and 10 queries per second. You can see how many queries you have remaining by typing geocodeQueryCheck(). You can also buy additional requests (up to 100,000 a day) for $0.50 / 1000 requests, and there is a paid subscription service that will provide up to 100,000 requests a day.

Because of this, it is important that you not test your code on your entire dataset or you’ll waste your queries when you’re debugging your code!

5 Making Maps

A Motivating Example Using ggplot2

library(tidyverse)
library(sp)
library(rgdal)
library(rgeos)
library(raster)
library(viridis)
library(gtable)
library(grid)


# load data: average age in CH
data <- read.csv("input_grossenbacher/avg_age_15.csv", stringsAsFactors = F)

# load shapefile
gde_15 <- readOGR("input_grossenbacher/geodata/gde-1-1-15.shp", layer = "gde-1-1-15")
OGR data source with driver: ESRI Shapefile 
Source: "input_grossenbacher/geodata/gde-1-1-15.shp", layer: "gde-1-1-15"
with 2324 features
It has 2 fields
Integer64 fields read as strings:  BFS_ID 
# project it
# set crs to ch1903/lv03, just to make sure  (EPSG:21781)
crs(gde_15) <- "+proj=somerc +lat_0=46.95240555555556
+lon_0=7.439583333333333 +k_0=1 +x_0=600000 +y_0=200000
+ellps=bessel +towgs84=674.374,15.056,405.346,0,0,0,0 +units=m +no_defs"
# fortify, i.e., make ggplot2-compatible

# fortify SpatialPolygonsDataFrame to read it in ggplot
map_data_fortified <- fortify(gde_15, region = "BFS_ID") %>%
  mutate(id = as.numeric(id))


# join fortified SpatialPolygonsDataFrame with data
map_data <- map_data_fortified %>% left_join(data, by = c("id" = "bfs_id"))


# load  municipalities shapefile
gde_15_political <- readOGR("input_grossenbacher/geodata/g1g15.shp", layer = "g1g15")
OGR data source with driver: ESRI Shapefile 
Source: "input_grossenbacher/geodata/g1g15.shp", layer: "g1g15"
with 2328 features
It has 20 fields
# project it
crs(gde_15_political) <- "+proj=somerc +lat_0=46.95240555555556
+lon_0=7.439583333333333 +k_0=1 +x_0=600000 +y_0=200000
+ellps=bessel +towgs84=674.374,15.056,405.346,0,0,0,0 +units=m +no_defs"

# fortify SpatialPolygonsDataFrame to read it in ggplot
map_data_political_fortified <- fortify(gde_15_political, region = "GMDNR") %>%
  mutate(id = as.numeric(id))

# join fortified SpatialPolygonsDataFrame with data
map_data_political <- map_data_political_fortified %>% left_join(data, by = c("id" = "bfs_id"))
map_data_political <- map_data_political[complete.cases(map_data_political),]


# read in background relief raster
relief <- raster("input_grossenbacher/geodata/02-relief-georef-clipped-resampled.tif")
relief_spdf <- as(relief, "SpatialPixelsDataFrame")
# relief is converted to a very simple data frame,
# just as the fortified municipalities.
# for that we need to convert it to a
# SpatialPixelsDataFrame first, and then extract its contents
# using as.data.frame
relief <- as.data.frame(relief_spdf) %>%
  rename(value = `X02.relief.georef.clipped.resampled`)
# remove unnecessary variables
rm(relief_spdf)
rm(gde_15)
rm(map_data_fortified)
rm(map_data_political_fortified)


# customize theme
theme_map <- function(...) {
  theme_minimal() +
  theme(
    text = element_text(family = "Avenir Next Medium", color = "#22211d"),
    axis.line = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    # panel.grid.minor = element_line(color = "#ebebe5", size = 0.2),
    panel.grid.major = element_line(color = "#ebebe5", size = 0.2),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.background = element_rect(fill = "#f5f5f2", color = NA),
    legend.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.border = element_blank(),
    ...
  )
}

# customize legend
extendLegendWithExtremes <- function(p){
  p_grob <- ggplotGrob(p)
  legend <- gtable_filter(p_grob, "guide-box")
  legend_grobs <- legend$grobs[[1]]$grobs[[1]]
  # grab the first key of legend
  legend_first_key <- gtable_filter(legend_grobs, "key-3-1-1")
  legend_first_key$widths <- unit(2, units = "cm")
  # modify its width and x properties to make it longer
  legend_first_key$grobs[[1]]$width <- unit(2, units = "cm")
  legend_first_key$grobs[[1]]$x <- unit(0.15, units = "cm")

  # last key of legend
  legend_last_key <- gtable_filter(legend_grobs, "key-3-6-1")
  legend_last_key$widths <- unit(2, units = "cm")
  # analogous
  legend_last_key$grobs[[1]]$width <- unit(2, units = "cm")
  legend_last_key$grobs[[1]]$x <- unit(1.02, units = "cm")

  # grab the last label so we can also shift its position
  legend_last_label <- gtable_filter(legend_grobs, "label-5-6")
  legend_last_label$grobs[[1]]$x <- unit(2, units = "cm")

  # Insert new color legend back into the combined legend
  legend_grobs$grobs[legend_grobs$layout$name == "key-3-1-1"][[1]] <-
    legend_first_key$grobs[[1]]
  legend_grobs$grobs[legend_grobs$layout$name == "key-3-6-1"][[1]] <-
    legend_last_key$grobs[[1]]
  legend_grobs$grobs[legend_grobs$layout$name == "label-5-6"][[1]] <-
    legend_last_label$grobs[[1]]

  # finally, I need to create a new label for the minimum value
  new_first_label <- legend_last_label$grobs[[1]]
  new_first_label$label <- round(min(map_data$avg_age_15, na.rm = T), 2)
  new_first_label$x <- unit(-0.15, units = "cm")
  new_first_label$hjust <- 1

  legend_grobs <- gtable_add_grob(legend_grobs,
                                  new_first_label,
                                  t = 6,
                                  l = 2,
                                  name = "label-5-0",
                                  clip = "off")
  legend$grobs[[1]]$grobs[1][[1]] <- legend_grobs
  p_grob$grobs[p_grob$layout$name == "guide-box"][[1]] <- legend

  # the plot is now drawn using this grid function
  grid.newpage()
  grid.draw(p_grob)
}


# same code as above but different breaks
pretty_breaks <- c(40,42,44,46,48)
# find the extremes
minVal <- min(map_data$avg_age_15, na.rm = T)
maxVal <- max(map_data$avg_age_15, na.rm = T)
# compute labels
labels <- c()
brks <- c(minVal, pretty_breaks, maxVal)
# round the labels (actually, only the extremes)
for(idx in 1:length(brks)){
  labels <- c(labels,round(brks[idx + 1], 2))
}

labels <- labels[1:length(labels)-1]
# define a new variable on the data set just as above
map_data$brks <- cut(map_data$avg_age_15,
                     breaks = brks,
                     include.lowest = TRUE,
                     labels = labels)

brks_scale <- levels(map_data$brks)
labels_scale <- rev(brks_scale)


# plot it
p <- ggplot() +
    # municipality polygons
    geom_raster(data = relief, aes_string(x = "x",
                                          y = "y",
                                          alpha = "value")) +
    scale_alpha(name = "", range = c(0.6, 0), guide = F)  +
    geom_polygon(data = map_data, aes(fill = brks,
                                      x = long,
                                      y = lat,
                                      group = group)) +
    # municipality outline
    geom_path(data = map_data, aes(x = long,
                                   y = lat,
                                   group = group),
              color = "white", size = 0.1) +
    coord_equal() +
    theme_map() +
    theme(
      legend.position = c(0.5, 0.03),
      legend.text.align = 0,
      legend.background = element_rect(fill = alpha('white', 0.0)),
      legend.text = element_text(size = 7, hjust = 0, color = "#4e4d47"),
      plot.title = element_text(hjust = 0.5, color = "#4e4d47"),
      plot.subtitle = element_text(hjust = 0.5, color = "#4e4d47",
                                   margin = margin(b = -0.1,
                                                   t = -0.1,
                                                   l = 2,
                                                   unit = "cm"),
                                   debug = F),
      legend.title = element_text(size = 8),
      plot.margin = unit(c(.5,.5,.2,.5), "cm"),
      panel.spacing = unit(c(-.1,0.2,.2,0.2), "cm"),
      panel.border = element_blank(),
      plot.caption = element_text(size = 6,
                                  hjust = 0.92,
                                  margin = margin(t = 0.2,
                                                  b = 0,
                                                  unit = "cm"),
                                  color = "#939184")
    ) +
    labs(x = NULL,
         y = NULL,
         title = "Switzerland's regional demographics",
         subtitle = "Average age in Swiss municipalities, 2015",
         caption = "Map CC-BY-SA; Author: Timo Grossenbacher (@grssnbchr), Geometries: ThemaKart, BFS; Data: BFS, 2016; Relief: swisstopo, 2016") +
    scale_fill_manual(
      values = rev(magma(8, alpha = 0.8)[2:7]),
      breaks = rev(brks_scale),
      name = "Average age",
      drop = FALSE,
      labels = labels_scale,
      guide = guide_legend(
        direction = "horizontal",
        keyheight = unit(2, units = "mm"),
        keywidth = unit(70/length(labels), units = "mm"),
        title.position = 'top',
        title.hjust = 0.5,
        label.hjust = 1,
        nrow = 1,
        byrow = T,
        reverse = T,
        label.position = "bottom"
      )
    )
extendLegendWithExtremes(p)

library(rgdal)
library(ggplot2)
palo_alto <- readOGR("RGIS3_Data/palo_alto", "palo_alto")
OGR data source with driver: ESRI Shapefile 
Source: "RGIS3_Data/palo_alto", layer: "palo_alto"
with 371 features
It has 5 fields
# create a unique ID for the later join
palo_alto$id = rownames(as.data.frame(palo_alto))


# turn SpatialPolygonsDataframe into a data frame using fortify
palo_alto.pts <- fortify(palo_alto, region="id") #this only has the coordinates
palo_alto.df <- merge(palo_alto.pts, palo_alto, by="id", type='left') # add the attributes back


# calculate even breaks
palo_alto.df$qt <- cut(palo_alto.df$PrCpInc, 6)


# plot  
ggplot(palo_alto.df, aes(long, lat, group = group, fill = qt)) + 
  geom_polygon() + 
  scale_fill_brewer("Per Cap Income", palette = "OrRd", labels = c('0 to 26,800', 
                                 '26,800 to 43,900', 
                                 '43,900 to 60,900',
                                 '60,900 to 78,000', 
                                 '78,000 to 95,100', 
                                 '95,100 to 112,000')) + 
  theme(line = element_blank(),  
        axis.text=element_blank(),
        axis.title=element_blank(),
        panel.background = element_blank()) +
  coord_equal()

tmap

library(tmap)
tm_shape(palo_alto) +
  tm_fill("PrCpInc", palette = 'OrRd')

ggmap

library(ggmap)
get_map('university zurich') %>% ggmap

get_map('university zurich', zoom = 15) %>% ggmap

europe <- c(left = -12, bottom = 35, right = 30, top = 63)
get_map(europe, zoom = 5) %>% ggmap

get_stamenmap(europe, zoom = 5, maptype = "toner-lite") %>% ggmap()

my_data <- tibble(
  cities = c('Barcelona', 'London', 'Dublin', 'Zürich', 'Paris', 'Berlin'),
  values = c(50, 20, 70, 10, 40, 90)
)

locations <- geocode(my_data$cities, source = 'google', output = 'more')

left_join(my_data, locations, by = c('cities' = 'locality')) %>% 
  select(cities:lat) -> my_data

get_stamenmap(europe, zoom = 5, maptype = "toner-lite") %>% 
  ggmap() +
  geom_point(data = my_data, aes(x = lon, y = lat, size = values), alpha = 0.5)

6 The sf package

In a few words, here’s what is interesting about sf:

The trick of sf is that spatial is not that special anymore: it describes spatial attributes as geometries, which is just another attribute of the dataset.

What was SpatialPoints, SpatialPolgons, etc before will now be: MULTIPOINT, MULTIPOLYGON, etc.

library(sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.4/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
Simple feature collection with 100 features and 14 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
str(nc)
Classes 'sf' and 'data.frame':  100 obs. of  15 variables:
 $ AREA     : num  0.114 0.061 0.143 0.07 0.153 0.097 0.062 0.091 0.118 0.124 ...
 $ PERIMETER: num  1.44 1.23 1.63 2.97 2.21 ...
 $ CNTY_    : num  1825 1827 1828 1831 1832 ...
 $ CNTY_ID  : num  1825 1827 1828 1831 1832 ...
 $ NAME     : Factor w/ 100 levels "Alamance","Alexander",..: 5 3 86 27 66 46 15 37 93 85 ...
 $ FIPS     : Factor w/ 100 levels "37001","37003",..: 5 3 86 27 66 46 15 37 93 85 ...
 $ FIPSNO   : num  37009 37005 37171 37053 37131 ...
 $ CRESS_ID : int  5 3 86 27 66 46 15 37 93 85 ...
 $ BIR74    : num  1091 487 3188 508 1421 ...
 $ SID74    : num  1 0 5 1 9 7 0 0 4 1 ...
 $ NWBIR74  : num  10 10 208 123 1066 ...
 $ BIR79    : num  1364 542 3616 830 1606 ...
 $ SID79    : num  0 3 6 2 3 5 2 2 2 5 ...
 $ NWBIR79  : num  19 12 260 145 1197 ...
 $ geometry :sfc_MULTIPOLYGON of length 100; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:27, 1:2] -81.5 -81.5 -81.6 -81.6 -81.7 ...
  ..- attr(*, "class")= chr  "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr  "AREA" "PERIMETER" "CNTY_" "CNTY_ID" ...
print(nc[9:15], n = 3)
Simple feature collection with 100 features and 6 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
First 3 features:
  BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
1  1091     1      10  1364     0      19 MULTIPOLYGON (((-81.4727554...
2   487     0      10   542     3      12 MULTIPOLYGON (((-81.2398910...
3  3188     5     208  3616     6     260 MULTIPOLYGON (((-80.4563446...

The functions provided by sf are prefixed by st_. That makes it easy to search for them on the command line too.

6.1 Loading Spatial Data

You can load spatial data from txt or csv files using the familiar dplyr commands,

df <- read_csv('./path/to/my/file.csv')

Let’s take the example dataset meuse from the sp package. meuse is a data.frame — similar to what could have been read from a CSV file using read_csv:

data('meuse', package = "sp")
head(meuse)
       x      y cadmium copper lead zinc  elev       dist   om ffreq soil
1 181072 333611    11.7     85  299 1022 7.909 0.00135803 13.6     1    1
2 181025 333558     8.6     81  277 1141 6.983 0.01222430 14.0     1    1
3 181165 333537     6.5     68  199  640 7.800 0.10302900 13.0     1    1
4 181298 333484     2.6     81  116  257 7.655 0.19009400  8.0     1    2
5 181307 333330     2.8     48  117  269 7.480 0.27709000  8.7     1    2
  lime landuse dist.m
1    1      Ah     50
2    1      Ah     30
3    1      Ah    150
4    0      Ga    270
5    0      Ah    380
 [ reached getOption("max.print") -- omitted 1 row ]
# what was 
# SpatialPointsDataFrame(
#    coordinates(select(meuse, x, y)), select(meuse, -(x:y)))

ms <- st_as_sf(
  meuse,
  coords = c('x', 'y'),
  crs = "+init=epsg:28992"
)
ms
Simple feature collection with 155 features and 12 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 178605 ymin: 329714 xmax: 181390 ymax: 333611
epsg (SRID):    28992
proj4string:    +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.4171,50.3319,465.5524,-0.398957,0.343988,-1.8774,4.0725 +units=m +no_defs
First 75 features:
   cadmium copper lead zinc  elev       dist   om ffreq soil lime landuse
1     11.7     85  299 1022 7.909 0.00135803 13.6     1    1    1      Ah
2      8.6     81  277 1141 6.983 0.01222430 14.0     1    1    1      Ah
3      6.5     68  199  640 7.800 0.10302900 13.0     1    1    1      Ah
4      2.6     81  116  257 7.655 0.19009400  8.0     1    2    0      Ga
5      2.8     48  117  269 7.480 0.27709000  8.7     1    2    0      Ah
   dist.m              geometry
1      50 POINT (181072 333611)
2      30 POINT (181025 333558)
3     150 POINT (181165 333537)
4     270 POINT (181298 333484)
5     380 POINT (181307 333330)
 [ reached getOption("max.print") -- omitted 70 rows ]

There is a simple plotting funtion included in sf. It is very similar to the old sp::spplot:

plot(ms)

# what would have been before readOGR()
file_name <- system.file("shape/nc.shp", package = "sf")
nc <- st_read(file_name)
Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.4/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
Simple feature collection with 100 features and 14 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
print(nc)
Simple feature collection with 100 features and 14 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
First 75 features:
    AREA PERIMETER CNTY_ CNTY_ID         NAME  FIPS FIPSNO CRESS_ID BIR74
1  0.114     1.442  1825    1825         Ashe 37009  37009        5  1091
2  0.061     1.231  1827    1827    Alleghany 37005  37005        3   487
3  0.143     1.630  1828    1828        Surry 37171  37171       86  3188
4  0.070     2.968  1831    1831    Currituck 37053  37053       27   508
5  0.153     2.206  1832    1832  Northampton 37131  37131       66  1421
   SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
1      1      10  1364     0      19 MULTIPOLYGON (((-81.4727554...
2      0      10   542     3      12 MULTIPOLYGON (((-81.2398910...
3      5     208  3616     6     260 MULTIPOLYGON (((-80.4563446...
4      1     123   830     2     145 MULTIPOLYGON (((-76.0089721...
5      9    1066  1606     3    1197 MULTIPOLYGON (((-77.2176666...
 [ reached getOption("max.print") -- omitted 70 rows ]
plot(nc)

plot(nc['AREA'])

6.2 Writing Spatial Data

st_write(nc, "output/nc.shp")
Writing layer `nc' to data source `output/nc.shp' using driver `ESRI Shapefile'
features:       100
fields:         14
geometry type:  Multi Polygon
st_write(nc, "output/nc.shp", delete_layer = TRUE) # overrides an existing file
Deleting layer `nc' using driver `ESRI Shapefile'
Writing layer `nc' to data source `/Users/econspare/Dropbox/teaching/2017 ProgrammingCamp/20-r-gis/output/nc.shp' using driver `ESRI Shapefile'
features:       100
fields:         14
geometry type:  Multi Polygon
write_sf(nc, "output/nc.shp") # same as no.2 with quiet = TRUE, delete_layer = TRUE

6.3 Merging Spatial with Non-Spatial Dataframes

non_spatial_df <- data.frame(
  CNTY_ID = nc$CNTY_ID,
  my_data <- runif(nrow(nc))
)

left_join(nc, non_spatial_df)
Simple feature collection with 100 features and 15 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
First 75 features:
    AREA PERIMETER CNTY_ CNTY_ID         NAME  FIPS FIPSNO CRESS_ID BIR74
1  0.114     1.442  1825    1825         Ashe 37009  37009        5  1091
2  0.061     1.231  1827    1827    Alleghany 37005  37005        3   487
3  0.143     1.630  1828    1828        Surry 37171  37171       86  3188
4  0.070     2.968  1831    1831    Currituck 37053  37053       27   508
   SID74 NWBIR74 BIR79 SID79 NWBIR79 my_data....runif.nrow.nc..
1      1      10  1364     0      19                 0.30614580
2      0      10   542     3      12                 0.63743965
3      5     208  3616     6     260                 0.84719127
4      1     123   830     2     145                 0.07022662
                         geometry
1  MULTIPOLYGON (((-81.4727554...
2  MULTIPOLYGON (((-81.2398910...
3  MULTIPOLYGON (((-80.4563446...
4  MULTIPOLYGON (((-76.0089721...
 [ reached getOption("max.print") -- omitted 71 rows ]

6.4 Projection

st_crs(nc)
$epsg
[1] 4267

$proj4string
[1] "+proj=longlat +datum=NAD27 +no_defs"

attr(,"class")
[1] "crs"
st_is_longlat(nc)
[1] TRUE
nc_p <- st_transform(nc, crs = 32119)
st_crs(nc_p)
$epsg
[1] 32119

$proj4string
[1] "+proj=lcc +lat_1=36.16666666666666 +lat_2=34.33333333333334 +lat_0=33.75 +lon_0=-79 +x_0=609601.22 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

attr(,"class")
[1] "crs"

6.5 ggplot2 compatible

# devtools::install_github('tidyverse/ggplot2', force = TRUE)
# install and restart your R session
library(sf)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = 'sf'), quiet = TRUE)
ggplot(nc) +
  geom_sf(aes(fill = AREA))

library(viridis)
ggplot(nc) +
  geom_sf(aes(fill = AREA)) +
  scale_fill_viridis("Area") +
  ggtitle("Area of counties in North Carlolina") +
  theme_bw()

library(viridis)
ggplot(nc) +
  geom_sf(aes(fill = AREA)) +
  scale_fill_viridis("Area") +
  coord_sf(crs = st_crs(102003)) +
  ggtitle("Area of counties in North Carlolina (Albers Projection") +
  theme_bw()

Sources

The document heavily draws from the excellent tutorials by Nick Eubank.

Moreover, have a look at these cheatsheets that may help you with learning both GIS concepts and vocabulary: